A Mostly Data-Driven Approach to Inverse Text Normalization
Authors
Abstract
For an automatic speech recognition system to produce sensibly formatted, readable output, the spoken-form token sequence produced by the core speech recognizer must be converted to a written-form string. This process is known as inverse text normalization (ITN). Here we present a mostly data-driven ITN system that leverages a set of simple rules and a few handcrafted grammars to cast ITN as a labeling problem. To this labeling problem, we apply a compact bi-directional LSTM. We show that the approach performs well using practical amounts of training data.
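To make the "ITN as a labeling problem" framing concrete, here is a minimal sketch in Python. The label scheme (a per-token rewrite plus an attach-to-previous flag), the function name `apply_labels`, and the example sentence are all assumptions made for illustration and are not taken from the paper. In a full system a tagger such as the bi-directional LSTM mentioned above would predict the labels; here they are written by hand.

```python
from typing import List, Optional, Tuple

# Hypothetical label: (rewrite, attach_left).
#   rewrite     - written-form replacement for the token; None keeps the token,
#                 "" drops it.
#   attach_left - if True, join the result to the previous output token
#                 without a space.
Label = Tuple[Optional[str], bool]


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Turn spoken-form tokens into a written-form string using per-token labels."""
    if len(tokens) != len(labels):
        raise ValueError("each token needs exactly one label")
    out: List[str] = []
    for token, (rewrite, attach_left) in zip(tokens, labels):
        written = token if rewrite is None else rewrite
        if written == "":
            continue  # token was absorbed by a neighbouring rewrite
        if attach_left and out:
            out[-1] += written
        else:
            out.append(written)
    return " ".join(out)


# Hand-written labels standing in for a tagger's predictions.
tokens = ["wake", "me", "at", "seven", "thirty", "a", "m"]
labels: List[Label] = [
    (None, False), (None, False), (None, False),
    ("7", False), (":30", True), ("AM", False), ("", False),
]
written = apply_labels(tokens, labels)
print(written)
```

Because every spoken token receives exactly one label, the conversion reduces to sequence tagging followed by a deterministic string-assembly step like the one above.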
Related resources
A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications
In this paper we address two related challenges in multimodal local search applications on mobile devices: first, correctly displaying the business names, and second, harvesting language model training data from an inconsistently labeled corpus. We investigate the impact of common text normalization and the quality of language model training corpus on the accuracy of displayed results. We propo...
Developing EOP materials for Pre-service Cabin Crew: A text-driven approach
One prominent criterion to achieve efficient learning and instruction in an educational setting is the appropriate material(s) specifically developed for that particular group of learners, particularly in an English for Occupational Purposes (EOP) context. This study aimed at developing new EOP materials for pre-service cabin crew in an aviation school. To do so, initially the researchers perfo...
A Sociolinguistic Scrutiny of the Great Gatsby and its Persian Translation in Light of Hatim and Mason’s Framework
Translation studies essentially deals with a socio-communicatively driven and contextualized enterprise. Viewed hence, it seems that no discipline tends to provide the possibility of studying the interrelations between interlocutors to generate meaning within the interactive social context as precisely as sociolinguistics (Federici, 2018). A sociolinguistic approach to translation seems to be i...
Extracting Temporal Information from Open Domain Text: A Comparative Exploration
The utility of data-driven techniques in the end-to-end problem of temporal information extraction is unclear. Recognition of temporal expressions yields readily to machine learning, but normalization seems to call for a rule-based approach. We explore two aspects of the (potential) utility of data-driven methods in the temporal information extraction task. First, we look at whether improving r...
The Calculation of the output price vector by applying reverse linear programming: The novel approach in DEA
In today’s world, where every routine is based on economic factors, there is no doubt that theoretical sciences are driven by their capabilities and affordances in terms of economy. As a mathematical tool, data envelopment analysis (DEA) is provided to economics, so that one can investigate associated costs, prices and revenues of economic units. Data Envelopment Analysis (DEA) is a linear...